Background of the Invention

1. Field of Invention

This invention relates to data storage systems.

2. Related Art

Many computer applications need to store and retrieve information. Information can be
stored on hard disks, floppy disks, CD-ROMs, semiconductor RAM memory and similar storage
devices. Many of these storage systems are susceptible to data loss of various forms,
including disk failures. A solution to the problem of disk failure involves use of a RAID
(redundant array of independent disks) system. One style of RAID system uses multiple hard
drives and stores parity data generated from the data drives, either on a separate drive
(known as the parity disk) or spread out among the multiple drives. The use of multiple
hard drives makes it possible to replace faulty hard drives without going off-line; data
contained on a drive can be rebuilt using the other data disks and the redundant data
contained in the parity disk. If a hard drive fails, a new hard drive can be inserted by
"hot-swapping" drives while on-line. The RAID system can rebuild the data on the new disk
using the remaining data disks and the redundant data of the parity disk. The performance
of a RAID system is improved by disk striping, which interleaves bytes or groups of bytes
across multiple drives, so more than one disk is reading and writing simultaneously. Files
are broken into chunks of data known as file blocks, and these file blocks are stored in
one or more physical sectors of one or more hard disks. Each file block is a given size,
such as 4,096 bytes, which takes up 8 sectors of 512 bytes each.

A first known problem with storage devices is that they are susceptible to data
corruption. This data corruption includes bit flips, misdirected I/O, lost I/O, sector
shifts, and block shifts. One style of RAID uses parity data to determine whether there
has been corruption of some data included in a disk stripe. Parity is checked by comparing
the parity value stored on disk against the parity value computed in memory. Parity is
computed by taking the exclusive-OR (henceforth "XOR") of the blocks in the data stripe.
If the stored and computed values of parity are not the same, the data may be corrupt. If
a single disk block is known to be incorrect, the RAID system includes enough data to
restore the corrupted block by recalculating the corrupted data using the parity data and
the remaining data in the data stripe. However, such RAID systems cannot determine which
disk includes the corrupt data from parity values alone. Thus, although parity data is
useful in determining whether corruption has occurred, it does not by itself include
enough information to restore the corrupted data, because it is unclear which data has
been corrupted.

Checksums are another form of redundant data that can be written to individual disks. The
combination of parity bits across the disks along with checksums and their associated
information may include enough information so that the corrupted data can be restored in
RAID and other redundant systems.
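
The parity check and single-block rebuild described above can be illustrated with a
minimal sketch. The block size, the byte values, and the helper name xor_blocks are
assumptions chosen for illustration and do not come from the application.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the bytewise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Four hypothetical 8-byte data blocks, one per data disk.
data = [bytes([value] * 8) for value in (0x11, 0x22, 0x44, 0x88)]
parity = xor_blocks(data)            # what the parity disk would hold

# Parity check: recompute parity in memory and compare with the stored value.
parity_ok = xor_blocks(data) == parity

# Rebuild: if the third disk is known to be bad, XOR the survivors with the parity.
survivors = data[:2] + data[3:]
rebuilt = xor_blocks(survivors + [parity])

# A silent corruption is detected, but parity alone does not say which disk holds it.
corrupted = list(data)
corrupted[1] = bytes([0x23]) + corrupted[1][1:]
mismatch = xor_blocks(corrupted) != parity
```

The rebuild only works because the failed disk is named in advance; the final lines show
a mismatch that parity can report but cannot attribute to a particular disk.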

A second known problem involves using a sector checksum for each sector of data. A sector
checksum is generated for each collection of data that can fill a sector. The data is
stored in a disk sector, along with the associated sector checksum. Some known systems
reformat a collection of hard disks from a standard sector size, such as 512-byte sectors,
to a larger size that leaves room for a sector checksum in each sector, such as 520-byte
sectors. Data corruption within the disk sector can then be detected by using the sector
checksum, because the stored checksum would not match a computed checksum. However, data
corruption such as sector slides, misdirected reads and writes, and lost sectors would not
be detected at the disk sector level. For this type of corruption, a checksum computed
from the sector data would match the stored checksum.
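
The limitation of sector checksums can be shown with a small sketch. The 512 + 8 byte
sector layout follows the text; the use of CRC-32 as the sector checksum, the in-memory
dictionary standing in for a disk, and all names are assumptions for illustration only.

```python
import zlib

SECTOR_DATA = 512   # data bytes per sector; 8 more bytes hold the sector checksum

def make_sector(data: bytes) -> bytes:
    """Return a 520-byte sector: 512 data bytes followed by an 8-byte checksum field."""
    assert len(data) == SECTOR_DATA
    return data + zlib.crc32(data).to_bytes(8, "big")

def sector_ok(sector: bytes) -> bool:
    """Check a sector against the checksum stored inside that same sector."""
    data, stored = sector[:SECTOR_DATA], sector[SECTOR_DATA:]
    return zlib.crc32(data).to_bytes(8, "big") == stored

disk = {n: make_sector(bytes([n]) * SECTOR_DATA) for n in range(4)}

# A bit flip inside a sector is detected.
flipped = bytearray(disk[1])
flipped[0] ^= 0x01
bit_flip_detected = not sector_ok(bytes(flipped))

# Misdirected write: data intended for sector 2 lands on sector 3 instead.
disk[3] = make_sector(b"\xAA" * SECTOR_DATA)
misdirected_detected = not (sector_ok(disk[2]) and sector_ok(disk[3]))
```

Sector 2 still holds stale but self-consistent data and sector 3 holds the wrong but
self-consistent data, so both pass the sector-level check.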

A third known problem is storing checksums in reserved locations separate from the
associated data. A separate read or write operation of the checksum is required for every
read or write operation of the associated data. This can result in performance loss in
some workloads.

Accordingly, it would be advantageous to provide an improved technique for the error
checking and correction of data storage systems. This is achieved in an embodiment of the
invention that is not subject to the drawbacks of the related art.

Summary of the Invention

The invention provides an improved method and apparatus for a reliable data storage
system using block level checksums appended to data blocks.

In a first aspect of the invention, a block-appended checksum is created at the filesystem
block level, where a block is the filesystem's unit of transfer. In a preferred
embodiment, the data storage system is a RAID system composed of multiple hard disk
drives, including a parity disk drive and a controller for the drives. Files are stored on
hard disks in storage blocks, including data blocks and block-appended checksums. The
block-appended checksum includes a checksum of the data block, a Virtual Block Number
(VBN), a Disk Block Number (DBN), and an embedded checksum for checking the integrity of
the block-appended checksum itself. The block-appended checksum reliably detects
corruption of data within a sector such as bit flips, as does a sector checksum. However,
a block-appended checksum also reliably detects data corruption across sectors including
sector slides, misdirected reads and writes, and lost sector I/O.

The combination of (1) parity bits across the RAID system stored on the parity disk, (2)
the remaining uncorrupted data in the data disks, and (3) the block-appended checksum
within each disk includes sufficient information to enable detection and restoration of
corrupt data in RAID systems and other similar devices. Such a combination is preferable
to using block-appended checksums alone because block-appended checksums are limited to
detecting errors.

In a second aspect of the invention, a file system includes file blocks with associated
block-appended checksums. The file blocks with block-appended checksums are written to
storage blocks. In a preferred embodiment, a collection of disk drives is formatted with
520 bytes of data per sector instead of the more commonly found 512 bytes per sector. For
each 4,096-byte file block, a corresponding 64-byte block-appended checksum is appended to
the file block. When this is written to disk, the first 7 sectors include most of the file
block data, while the 8th sector includes the remaining file block data and the 64-byte
block-appended checksum, for a total of 4,160 bytes of data. Because the block-appended
checksums are appended to the file blocks, every read or write to a storage block includes
the reading or writing of the file block and the block-appended checksum in a single
operation. In known cases, this results in greatly improved performance.

In a third aspect of the invention, I/O operations are first stored in NVRAM (non-volatile
random access memory). In the event of a system crash, I/O operations are replayed from
NVRAM, which preserves file block data. When the I/O operation is performed again, the
corresponding block-appended checksum information is simply recalculated.
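
The replay behavior of the third aspect might be sketched as follows. The log format (a
list of DBN and data pairs), the dictionary standing in for a disk, and the use of
Adler-32 from zlib are all assumptions for illustration; the application does not specify
them at this point.

```python
import zlib

def write_block(disk, dbn, data):
    """Store a file block together with a freshly computed checksum."""
    disk[dbn] = (data, zlib.adler32(data))

def replay(nvram_log, disk):
    """Re-issue every logged write; checksums are recomputed rather than read from the log."""
    for dbn, data in nvram_log:
        write_block(disk, dbn, data)

nvram_log = [(7, b"alpha"), (9, b"beta")]   # hypothetical logged writes: (DBN, data)
disk = {7: (b"alp", 0)}                      # incomplete write left behind by a crash
replay(nvram_log, disk)
```

Only the file block data needs to survive in the log, because the checksum is a pure
function of that data and can be regenerated during replay.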

In a preferred embodiment, the invention is operative on a RAID level 4 system for a file
server. However, in other embodiments, the invention is applicable to any computer data
storage system, such as a database system, or to a store and forward system, such as cache
or RAM.



Brief Description of the Drawings

Figure 1 shows a block diagram of a reliable, redundant data storage system including
block-appended checksums.

Figure 2 shows a flow diagram of a method for writing file blocks with block-appended
checksums to a reliable, redundant data storage system.

Figure 3 shows a flow diagram of a method for reading data from data storage blocks
including file blocks with block-appended checksums in a reliable, redundant data
storage system.

Detailed Description of the Preferred Embodiment 

In the following description, a preferred embodiment of the invention is 
described with regard to preferred process steps and structures. Those skilled in the art 
would recognize after perusal of this application that embodiments of the invention can be 
implemented using elements adapted to particular process steps and structures described 
herein, and that implementation of the process steps and structures described herein would 
not require undue experimentation or further invention. 

Incorporated Disclosures 

The inventions described herein can be used in conjunction with inventions 
described in the following applications: 

• Application Serial Number 09/642,063, in the name of Blake LEWIS, Ray CHEN and
Kayuri PATEL, filed on August 18, 2000, Express Mailing Number
EL524781089US, titled "Reserving File System Blocks", assigned to the same
assignee, attorney docket number 103.1033.01, and all pending cases claiming the
priority thereof.

• U.S. Patent Application Serial No. 09/642,062, in the name of Rajesh SUNDARAM,
Srinivasan VISWANATHAN, Alan ROWE, Steven R. KLEIMAN and John
EDWARDS, filed August 18, 2000, Express Mail Mailing No. EL524780242US,
titled "Dynamic Data Space", assigned to the same assignee,
attorney docket number 103.1034.01, and all pending cases claiming the priority
thereof.

• Application Serial Number 09/642,066, in the name of Ray CHEN, John EDWARDS
and Kayuri PATEL, filed on August 18, 2000, Express Mailing Number
EL524780256US, titled "Manipulation of Zombie Files and Evil-Twin Files",
assigned to the same assignee, attorney docket number 103.1047.01, and all pending
cases claiming the priority thereof.

• Application Serial Number 09/642,065, in the name of Doug DOUCETTE, Blake
LEWIS and John EDWARDS, filed August 18, 2000, Express Mailing Number
EL524781092US, titled "Improved Space Allocation in a Write Anywhere File
System", assigned to the same assignee, attorney docket number 103.1045.01, and all
pending cases claiming the priority thereof.

• Application Serial Number 09/642,061, in the name of Blake LEWIS, John
EDWARDS, and Srinivasan VISWANATHAN, filed on August 18, 2000, Express
Mailing Number EL524780239US, titled "Instant Snapshot", assigned to the same
assignee, attorney docket number 103.1035.01, and all pending cases claiming the
priority thereof.

• Application Serial Number 09/642,064, in the name of Scott SCHOENTHAL, filed
August 18, 2000, Express Mailing Number EL524781075US, titled "Persistent and
Reliable Delivery of Event Messages", assigned to the same assignee, attorney docket
number 103.1048.01, and all pending cases claiming the priority thereof.

Lexicography

As used herein, use of the following terms refers or relates to aspects of the invention
as described below. The general meaning of these terms is intended to be illustrative and
in no way limiting.

• Sector - In general, the term "sector" refers to a physical section of a disk drive
including a collection of bytes, such as 512 or 520 bytes. This is the disk drive's
minimal unit of transfer.

• Storage block - In general, the term "storage block" refers to a group of sectors, such
as 8 sectors, or 4,096 bytes for 512-byte sectors or 4,160 bytes for 520-byte sectors.

• Data block - In general, the term "data block" refers to a collection of bytes stored in
a storage block, such as 4,096 bytes with 512-byte sectors or 4,160 bytes with 520-byte
sectors, each stored in 8 sectors.

• Block-appended checksum - In general, the term "block-appended checksum" refers
to a collection of bytes, such as 64 bytes, which may include a checksum of a data
block, a Virtual Block Number (VBN), a Disk Block Number (DBN), and an
embedded checksum for checking the integrity of the checksum information itself.

• Stripe - In general, the term "stripe" refers to the collection of blocks in a volume
with the same DBN on each disk.

• Volume - In general, the term "volume" refers to a single file system spread across
multiple disks and associated disk drives. Known data storage systems have current
size limits, such as greater than one terabyte, and are included in multiple volumes.

• DBN (Disk Block Number) - In general, the term "DBN" refers to a location of a
particular block on a disk in a volume of a file system.

• VBN (Volume Block Number) - In general, the term "VBN" refers to an integer that
maps to a disk number and disk block number.

• WAFL (Write Anywhere File Layout) - In general, the term "WAFL" refers to a high
level structure for a file system that is above RAID in hierarchy and includes metadata,
such as one or more copies of the "fsinfo block" (file system information block) located
at fixed locations on disk. Pointers are used for locating the remaining data. All the
data except the fsinfo blocks are collected into files and these files can be written
anywhere on the disk.

• Parity checking - In general, the term "parity checking" refers to an error detection
technique that tests the integrity of digital data within a computer system or over a
network. Parity bits are checked by comparing them against computed values of
parity, which are the XOR of the sets of data bits.

• Parity disk - In general, the term "parity disk" refers to a separate disk drive that
holds parity bits in a disk array, such as four data disks and one parity disk in a
volume of a data storage system.

Express mailing EL524780565US Page 12

103.1049.01

• Parity protected - In general, the term "parity protected" refers to protection of a
collection of data using parity bits. Data is parity protected if it has parity for an
entire collection of data. In a preferred embodiment, parity computations can be
made across bytes.

• Checksum - In general, the term "checksum" refers to a value used to ensure data is
stored or transmitted without error. This value is created by calculating the binary
values in a block of data using some algorithm and storing the results with the data or
at a separate location. When the data is retrieved from memory, received at the other
end of a network, or retrieved from a computer storage system, a new checksum is
computed and matched against the existing checksum. A mismatch indicates an error.

As described herein, the scope and spirit of the invention is not limited to any of the
definitions or specific examples shown therein, but is intended to include the most
general concepts embodied by these and other terms.

System Elements

Figure 1 shows a block diagram of a reliable, redundant data storage system including
block-appended checksums.

A data storage system 100 includes a controller CPU (central processing unit) 105, an I/O
port 110, a file system 115, a RAID system 125, a disk driver 135, a host/disk adapter
145, and a hard disk collection 150, including drive 155, drive 160, drive 165, drive 170
and parity drive 175.

The data storage system 100 is part of a larger computer system. The I/O port 110 is
connected to the larger computer system in such a way that the controller CPU 105 can send
data to and receive data from the I/O port 110. The data is written to and read from the
hard disk collection 150, including a parity disk 175, in the data storage system 100.

Unlike other parity systems that may require breaking up the bytes in a block of data or
breaking up the block of data itself, each bit in the parity block is computed using the
corresponding bits in the data blocks. Thus, if there are four blocks of data, the first
block would be put on drive 155, the second block on drive 160, the third block on drive
165 and the fourth block on drive 170. The parity block is computed using an XOR of the
data blocks.

In a preferred embodiment, the five disk drives 155, 160, 165, 170 and 175 in a RAID
system 125 include one or more volumes. A volume is a single file system in a data storage
system. Each block has a unique VBN (volume block number) and a DBN (disk block number).
The VBN specifies the location of a block in a volume. The DBN specifies the location of a
block in a disk. Therefore, more than one block can have the same DBN if they are in the
same location on different disks. However, only one block can have a given VBN.

Known data storage systems format hard disks with 512 bytes per sector. Prior art systems
with checksums may include disks formatted with 520-byte sectors comprising 512 bytes of
file block data and 8 bytes for a sector checksum. In a preferred embodiment, each disk in
a hard disk collection 150 is formatted with 520 bytes per sector. Files are broken into
fixed sizes of data known as file blocks. These file blocks are stored in one or more
physical sectors of one or more hard disks; for example, a 4,096-byte file block takes up
8 sectors. With a hard disk formatted to 512-byte sectors, the file block fits into 8
sectors with no extra bytes remaining. With a hard disk formatted for 520 bytes per
sector, the 4,096-byte file block fits into the 8 sectors with 64 bytes free for a
block-appended checksum. The first 7 sectors contain only file block data, while the 8th
sector includes the remaining file block data and ends with the 64-byte block-appended
checksum. This 520-bytes-per-sector formatting approach allows the file block and checksum
to be written or read in a single operation. The resulting block-appended checksum has an
advantage over the prior art sector checksums in a 520-byte formatted hard disk because it
can reliably detect sector data corruption such as sector slides, misdirected reads and
writes, lost sectors and similar defects.
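
The sector arithmetic above can be checked with a short sketch. The sizes come from the
text; the function name split_into_sectors and the placeholder byte values are
assumptions for illustration.

```python
SECTOR_SIZE = 520          # bytes per sector after reformatting
SECTORS_PER_BLOCK = 8
FILE_BLOCK_SIZE = 4096
CHECKSUM_SIZE = 64

def split_into_sectors(file_block: bytes, checksum: bytes):
    """Lay a file block plus its block-appended checksum out as eight 520-byte sectors."""
    assert len(file_block) == FILE_BLOCK_SIZE and len(checksum) == CHECKSUM_SIZE
    storage_block = file_block + checksum            # 4,160 bytes in total
    return [storage_block[i:i + SECTOR_SIZE]
            for i in range(0, len(storage_block), SECTOR_SIZE)]

# Placeholder contents: zero bytes for the data, 0xFF bytes for the checksum area.
sectors = split_into_sectors(bytes(FILE_BLOCK_SIZE), b"\xff" * CHECKSUM_SIZE)

data_in_first_seven = 7 * SECTOR_SIZE                    # 3,640 bytes of file block data
data_in_last = FILE_BLOCK_SIZE - data_in_first_seven     # 456 bytes of file block data
```

The 8th sector therefore carries 456 bytes of file block data followed by the 64-byte
block-appended checksum, which together fill its 520 bytes exactly.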

In a preferred embodiment, a series of software and hardware layers is required for
reading and writing data between the CPU 105 and the hard disk collection 150. A file
system 115 takes a relatively large data file and divides it into a group of file blocks
of a given size, such as 4,096 bytes. A RAID system stripes these file blocks across a
collection of hard disks, such as a hard disk collection 150 including four data disks,
disk 1 155, disk 2 160, disk 3 165 and disk 4 170, plus a parity disk 175 that provides
redundancy for the data. High performance is accomplished using the RAID system by
breaking the group of file blocks into four subgroups and striping these subgroups of
blocks in parallel across the data disks. Each file block in a RAID system receives a
block-appended checksum of a given size, such as 64 bytes. The block-appended checksum is
appended to the file block to produce a file block with a block-appended checksum, 4,160
bytes in total. The block-appended checksum information includes at least: a 4-byte
checksum of the data block; a Virtual Block Number (VBN); a Disk Block Number (DBN); and a
4-byte embedded checksum for checking the integrity of the block-appended checksum itself.
Other embodiments may use other formats of data and checksum algorithms other than
Adler's. A sector checksum and a block-appended checksum both reliably detect bit flips.
However, only a block-appended checksum reliably detects sector data corruption including
sector slides, misdirected reads and writes, and lost sectors.
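
A minimal sketch of how such a 64-byte block-appended checksum could be built and verified
follows. The text fixes only the two 4-byte checksum fields and the overall 64-byte size;
the 8-byte widths for VBN and DBN, the zero padding, the field order, and the function
names are assumptions for illustration. Adler-32 is taken from Python's zlib module.

```python
import struct
import zlib

# Hypothetical layout: 4-byte data checksum, 8-byte VBN, 8-byte DBN,
# 40 bytes of zero padding, then a 4-byte embedded checksum over the first 60 bytes.
_BODY = struct.Struct(">IQQ40x")

def make_block_checksum(data: bytes, vbn: int, dbn: int) -> bytes:
    """Build the 64-byte checksum area for one file block."""
    body = _BODY.pack(zlib.adler32(data), vbn, dbn)
    return body + struct.pack(">I", zlib.adler32(body))

def verify_block(data: bytes, appended: bytes, vbn: int, dbn: int) -> bool:
    """Check the checksum area, the data, and the block's expected location."""
    body = appended[:60]
    (embedded,) = struct.unpack(">I", appended[60:])
    if zlib.adler32(body) != embedded:
        return False                      # the checksum area itself is damaged
    data_sum, stored_vbn, stored_dbn = _BODY.unpack(body)
    return (data_sum == zlib.adler32(data)
            and stored_vbn == vbn and stored_dbn == dbn)

block = b"A" * 4096
tail = make_block_checksum(block, vbn=1000, dbn=250)
```

Because the stored VBN and DBN are compared with the location the reader asked for, a
block that is internally consistent but sits at the wrong address fails verification,
which a per-sector checksum cannot do.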

In a preferred embodiment, the file system 115 allocates a collection of 4,096-byte
buffers, one for each file block, when writing a stripe of blocks to the hard disk
collection 150. Each file block has the same DBN in a stripe provided the hard disk
collection 150 is composed of equal sizes of hard disks. Each 4,096-byte file block 120 is
written to a 4,096-byte buffer and sent to RAID 125. In RAID 125, each 4,096-byte buffer
is appended with 64 bytes, for a total block size of 4,160 bytes, to accommodate the
64-byte block-appended checksum 130. The I/O operations are logged to NVRAM. If the system
crashes after this point, the file blocks can be restored upon recovery from the crash by
replaying the log of I/O operations from NVRAM. Each 4,096-byte file block plus 64-byte
checksum 140 is sent to the disk driver 135. The disk driver 135 creates a scatter/gather
list that provides instructions where the host/disk adapter 145 should distribute each
file block plus 64-byte checksum 140. The collection of buffers and the scatter/gather
list are sent to the host/disk adapter 145. The host/disk adapter 145 then writes the
stripe of file blocks with the block-appended checksums to the hard disk collection 150
including the four hard disks, disk 1 155, disk 2 160, disk 3 165 and disk 4 170. The
parity data is created from the stripe of file blocks and it is written onto the parity
disk 175. A file block with block-appended checksum 180 is written to a storage block on
disk 1 155 that is composed of 8 sectors of the disk. There is a single operation for
writing the file block with appended checksum. The file block data extends across all 8
sectors, with space remaining in the last part of the last sector to hold the
block-appended checksum.

When a file is read, each file block and its associated block-appended checksum are also
read in a single operation. The stored block-appended checksum is compared with a computed
block-appended checksum to validate the data. If the stored and computed block-appended
checksums are not equivalent, the data has been corrupted and must be rebuilt using the
remaining hard disks, including the parity disk 175, in the hard disk collection 150.
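
The way a checksum mismatch and the parity disk work together can be sketched as follows.
For brevity the sketch keeps one Adler-32 value per block in a separate list rather than
in a 64-byte appended area, and uses 16-byte blocks; those simplifications and all names
are assumptions for illustration.

```python
import zlib
from functools import reduce

def xor_blocks(blocks):
    """Return the bytewise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def repair_stripe(blocks, checksums, parity):
    """Return (index_of_bad_block, repaired_blocks); the index is None if all blocks verify."""
    bad = [i for i, (block, stored) in enumerate(zip(blocks, checksums))
           if zlib.adler32(block) != stored]
    if not bad:
        return None, list(blocks)
    if len(bad) > 1:
        raise ValueError("a single parity block cannot rebuild more than one block")
    index = bad[0]
    survivors = [block for j, block in enumerate(blocks) if j != index]
    repaired = list(blocks)
    repaired[index] = xor_blocks(survivors + [parity])
    return index, repaired

stripe = [bytes([n]) * 16 for n in (1, 2, 3, 4)]   # four hypothetical data blocks
sums = [zlib.adler32(block) for block in stripe]
parity = xor_blocks(stripe)

damaged = list(stripe)
damaged[2] = b"\x00" * 16                           # silent corruption on the third disk
bad_index, repaired = repair_stripe(damaged, sums, parity)
```

The checksum identifies which block is wrong, and the parity block plus the surviving
blocks supply the data needed to restore it.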

Method of Use

Figure 2 shows a flow diagram of a method for writing file blocks with block-appended
checksums to a reliable, redundant data storage system.

A method 200 is performed by the data storage system 100. Although the method 200 is
described serially, the steps of the method 200 can be performed by separate elements in
conjunction or in parallel, whether asynchronously, in a pipelined manner, or otherwise.
There is no particular requirement that the method 200 be performed in the same order in
which this description lists the steps, except where so indicated.

At a flow point 205, the data storage system 100 is ready to perform the method 200 to a
file system 115, including writing file blocks and block-appended checksums. In the
preferred embodiment, the write method 200 requires formatting hard disks to 520-byte
sectors.

At a step 210, the data storage system 100 receives a request from the user to write a
file block to the file system 115.

At a step 215, the data storage system 100 allocates and fills a 4,096-byte buffer with a
file block.

At a step 220, the data storage system 100 sends the filled 4,096-byte buffer to RAID 125.

At a step 225, the data storage system 100 allocates a 64-byte buffer in RAID 125.

At a step 230, the data storage system 100 computes a block-appended checksum for the
4,096-byte file block in the 4,096-byte buffer, fills the 64-byte buffer with the
block-appended checksum and appends the 64-byte buffer to the 4,096-byte buffer.

At a step 235, the data storage system 100 sends the 4,096-byte file block buffer
including the 64-byte block-appended checksum buffer to the disk driver 135.

At a step 240, the data storage system 100 creates a scatter/gather list using the disk
driver 135 to distribute the 4,096-byte file block with appended checksum to a group of
sectors making up a storage block on one or more of the disks in the hard disk collection
150.

At a step 245, the data storage system 100 sends the 4,096-byte buffer, including the
appended 64-byte buffer, and the scatter/gather list to the host/disk adapter 145.

At a step 250, the data storage system 100 writes the file block with the block-appended
checksum to a storage block in a single operation.

At a step 255, the data storage system 100 completes writing to one or more of the hard
disks in the hard disk collection 150.

At a step 260, the data storage system 100 frees up the 64-byte buffer in RAID 125.

At a flow point 265, the data storage system 100 has succeeded or failed at writing a file
block to the file system 115.
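
The write steps above can be sketched in miniature. The representation of a
scatter/gather list as (buffer, offset, length) tuples, the dictionary standing in for a
disk, the starting sector number, and the checksum area holding only an Adler-32 of the
data are all simplifying assumptions; the application does not define these details.

```python
import zlib

SECTOR_SIZE, SECTORS_PER_BLOCK = 520, 8

def build_scatter_gather(data_buf, checksum_buf):
    """Describe, in order, which buffer ranges make up one storage block."""
    return [(data_buf, 0, len(data_buf)), (checksum_buf, 0, len(checksum_buf))]

def write_storage_block(disk, first_sector, sg_list):
    """Gather the listed ranges and write them as one contiguous run of sectors."""
    payload = b"".join(bytes(buf[off:off + n]) for buf, off, n in sg_list)
    assert len(payload) == SECTOR_SIZE * SECTORS_PER_BLOCK
    for k in range(SECTORS_PER_BLOCK):
        disk[first_sector + k] = payload[k * SECTOR_SIZE:(k + 1) * SECTOR_SIZE]

data_buf = bytearray(b"x" * 4096)                  # step 215: fill the 4,096-byte buffer
checksum_buf = bytearray(64)                       # step 225: allocate the 64-byte buffer
# Step 230, simplified: only the data checksum is filled in; VBN and DBN are omitted.
checksum_buf[:4] = zlib.adler32(bytes(data_buf)).to_bytes(4, "big")

disk = {}
sg_list = build_scatter_gather(data_buf, checksum_buf)    # step 240
write_storage_block(disk, 80, sg_list)                     # steps 245 to 255
```

Keeping the data and checksum in two buffers and joining them only through the
scatter/gather list means the file system's 4,096-byte buffer never has to be copied into
a larger one before the single write.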
Figure 3 shows a flow diagram of a method for reading data from data storage blocks
including file blocks with block-appended checksums in a reliable, redundant data storage
system.

A read method 300 is performed by the data storage system 100. Although the read method
300 is described serially, the steps of the read method 300 can be performed by separate
elements in conjunction or in parallel, whether asynchronously, in a pipelined manner, or
otherwise. There is no particular requirement that the read method 300 be performed in the
same order in which this description lists the steps, except where so indicated.



At a flow point 305, the data storage system 100 is ready for requests to read file blocks
from a file system 115, including reading file blocks and block-appended checksums.

At a step 310, the data storage system 100 receives a request from the user to read a file
block from the file system 115.

At a step 315, the data storage system 100 allocates a 4,096-byte buffer for a file block.

At a step 320, the data storage system 100 sends the empty 4,096-byte buffer to RAID 125.

At a step 325, the data storage system 100 allocates a 64-byte buffer in RAID 125 and
appends it to the 4,096-byte buffer.

At a step 330, the data storage system 100 sends the 4,096-byte file block buffer with the
64-byte block-appended checksum buffer to the disk driver 135.

At a step 335, the data storage system 100 creates a scatter/gather list in the disk
driver 135 to collect the 4,096-byte file block from a group of sectors making up a
storage block on one or more of the disks in the hard disk collection 150.

At a step 340, the data storage system 100 sends the 4,096-byte buffer with the appended
64-byte buffer along with the scatter/gather list to the host/disk adapter 145.

At a step 345, the data storage system 100 reads the file block with the block-appended
checksum from a storage block in a single operation.

At a step 350, the data storage system 100 completes reading the file block with the
block-appended checksum from one or more of the hard disks in the hard disk collection
150.

At a step 355, the data storage system 100 computes the block-appended checksum for the
file block and compares it to the stored block-appended checksum to verify the file block.

At a step 360, the data storage system 100 frees up the 64-byte buffer in RAID 125.

At a flow point 365, the data storage system 100 has succeeded or failed at reading a file
block from the file system 115.

Alternative Embodiments

Although preferred embodiments are disclosed herein, many variations are possible which
remain within the concept, scope, and spirit of the invention, and these variations would
become clear to those skilled in the art after perusal of this application.
Claims

1. An apparatus including a mass storage device, said mass storage device having a
plurality of sectors, said apparatus including
a plurality of storage blocks, each said storage block including a plurality of said
sectors;
wherein each said storage block includes a data portion and an error code portion;
wherein said data portion is responsive to data for said data block; and
wherein said error code portion is responsive to data for a plurality of said sectors in
each said storage block.

2. An apparatus as in claim 1, wherein said mass storage device includes one or more hard
disks.

3. An apparatus as in claim 1, wherein said mass storage device includes a RAID storage
device.

4. An apparatus as in claim 3, wherein said RAID storage device is a RAID level 4 device.

5. An apparatus as in claim 1, wherein said error code portion is appended to said data
portion.

6. An apparatus as in claim 1, wherein said error code portion includes a checksum of said
data for said data block.

7. An apparatus as in claim 6, wherein said checksum of said data for said data block
includes 4 bytes of checksum data.

8. An apparatus as in claim 6, wherein said checksum is a block-appended checksum.

9. An apparatus as in claim 8, wherein said block-appended checksum includes a checksum of
said block-appended checksum.

10. An apparatus as in claim 9, wherein said checksum of said block-appended checksum
includes 4 bytes of data.

11. An apparatus as in claim 1, wherein said mass storage device includes a cache or RAM.

12. An apparatus as in claim 1, wherein said mass storage device includes one or more hard
disks formatted with 520 bytes per sector.

13. An apparatus as in claim 1, wherein said plurality of said sectors is eight sectors.

14. An apparatus as in claim 1, wherein said error code portion includes 64 bytes of error
code data.

15. An apparatus as in claim 1, wherein said data portion includes 4,096 bytes of data.

16. An apparatus as in claim 1, wherein said sectors include 520 bytes of data storage.

17. An apparatus as in claim 1, wherein said storage block includes 4,160 bytes of data
and error code storage space.

18. An apparatus for protecting a mass storage device from data storage errors, said mass storage device having a plurality of sectors, said apparatus including
    a plurality of storage blocks, each said storage block including a plurality of said sectors;
    wherein a first subset of each said storage block is responsive to data for said storage block;
    wherein a second subset of each said storage blocks is responsive to error code information; and
    wherein said error code information is responsive to data for a plurality of said sectors in each said storage block.

19. An apparatus as in claim 18, wherein said mass storage device includes one or more hard disks.

20. An apparatus as in claim 18, wherein said mass storage device includes a RAID storage system.

21. An apparatus as in claim 20, wherein said RAID storage system is a RAID level 4 system.

22. An apparatus as in claim 18, wherein said second subset is appended to said first subset.

23. An apparatus as in claim 18, wherein said error code information includes a checksum of said data for said storage block.

24. An apparatus as in claim 23, wherein said checksum of said data for said storage block includes 4-bytes of checksum data.

25. An apparatus as in claim 23, wherein said checksum is a block-appended checksum.

26. An apparatus as in claim 25, wherein said block-appended checksum includes a checksum of said block-appended checksum.

27. An apparatus as in claim 26, wherein said checksum of said block-appended checksum includes 4-bytes of data.

28. An apparatus as in claim 18, wherein said mass storage device includes a cache or RAM.

29. An apparatus as in claim 18, wherein said mass storage device includes one or more hard disks formatted with 520-bytes per sector.

30. An apparatus as in claim 18, wherein said plurality of said sectors is eight sectors.

31. An apparatus as in claim 18, wherein said second subset includes 64-bytes of error code data.

32. An apparatus as in claim 18, wherein said first subset includes 4,096-bytes of data.

33. An apparatus as in claim 18, wherein said sectors include 520-bytes of data storage.

34. An apparatus as in claim 18, wherein said first and second subsets together include 4,160-bytes of data and error code storage space.

35. A method for protecting data from data storage errors in a mass storage system, said mass storage system having a plurality of sectors, said method including
    determining a plurality of storage blocks, each said storage block including a plurality of said sectors;
    dividing each said storage block into a first subset and a second subset;
    generating error code information responsive to data for a plurality of said sectors in each said storage block;
    wherein said first subset is responsive to data for said storage block; and
    wherein said second subset is responsive to error code information.

36. A method as in claim 35, wherein said mass storage system includes one or more hard disks.

37. A method as in claim 35, wherein said mass storage system includes a RAID storage system.

38. A method as in claim 37, wherein said RAID storage system is a RAID level 4 system.

39. A method as in claim 35, wherein said second subset is appended to said first subset.

40. A method as in claim 35, wherein said error code information includes a checksum of the said data for said storage block.

41. A method as in claim 40, wherein said checksum of said data for said storage block includes 4-bytes of checksum data.

42. A method as in claim 40, wherein said checksum is a block-appended checksum.



43. A method as in claim 42, wherein said block-appended checksum includes a checksum of said block-appended checksum.

44. A method as in claim 43, wherein said checksum of said block-appended checksum includes 4-bytes of data.

45. A method as in claim 35, wherein said mass storage system includes a cache or RAM.

46. A method as in claim 35, wherein said mass storage system includes one or more hard disks formatted with 520-bytes per sector.

47. A method as in claim 35, wherein said plurality of said sectors is eight sectors.

48. A method as in claim 35, wherein said second subset includes 64-bytes of error code data.

49. A method as in claim 35, wherein said first subset includes 4,096-bytes of data.

50. A method as in claim 35, wherein said sectors include 520-bytes of data storage.

51. A method as in claim 35, wherein said first and second subsets together include 4,160-bytes of data and error code storage space.

52. A method for efficiently detecting data errors in a mass storage system, said mass storage system having a plurality of storage blocks composed of a collection of sectors, including
    reading data and error code information located in said storage blocks in a single operation;
    calculating run-time error code information for said data located in storage blocks; and
    comparing said error code information with said run-time error code information.

53. A method as in claim 52, wherein said mass storage system includes one or more hard disks.

54. A method as in claim 52, wherein said mass storage system includes a RAID storage system.

55. A method as in claim 52, wherein said RAID system is a RAID level 4 system.

56. A method as in claim 52, wherein said error code information is appended to said reading data.

57. A method as in claim 52, wherein said error code information includes a checksum of the said reading data.

58. A method as in claim 57, wherein said checksum of said reading data includes 4-bytes of checksum data.

59. A method as in claim 58, wherein said checksum is a block-appended checksum.

60. A method as in claim 59, wherein said block-appended checksum includes a checksum of said block-appended checksum.

61. A method as in claim 60, wherein said checksum of said block-appended checksum includes 4-bytes of data.

62. A method as in claim 52, wherein said mass storage system includes a cache or RAM.

63. A method as in claim 52, wherein said mass storage system includes one or more hard disks formatted with 520-bytes per sector.

64. A method as in claim 52, wherein said collection of sectors is eight sectors.

65. A method as in claim 52, wherein said error code information includes 64-bytes of error code data.

66. A method as in claim 52, wherein said reading data includes 4,096-bytes of data.

67. A method as in claim 52, wherein said sectors include 520-bytes of data storage.

68. A method as in claim 52, wherein said reading data and error code information together includes 4,160-bytes of data and error code storage space.

69. A method as in claim 52, including determining whether said run-time error code information and said error code information in said storage blocks are equivalent.

70. A method as in claim 52, including alerting said mass storage system if said run-time error code information and said error code information in said storage blocks are not equivalent.

71. A method as in claim 52, including retrieving said reading data if said run-time error code information and said error code information in said storage blocks are equivalent.
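
The detection method of claims 52 and 69 to 71 can be illustrated with a minimal sketch. It assumes a 4,096-byte data portion followed by 64 bytes of error code information, with a 4-byte checksum held in the first four bytes of that area. The claims do not name a checksum algorithm, so CRC-32 from Python's zlib module is used purely as a stand-in, and the function names below are hypothetical.

```python
import zlib

DATA_BYTES = 4096                              # data portion (claim 66)
ERROR_CODE_BYTES = 64                          # error code information (claim 65)
BLOCK_BYTES = DATA_BYTES + ERROR_CODE_BYTES    # 4,160 bytes in total (claim 68)
CHECKSUM_BYTES = 4                             # 4-byte checksum (claim 58)


def run_time_checksum(data: bytes) -> bytes:
    """Compute a 4-byte checksum of the data (CRC-32 is an assumed stand-in)."""
    return zlib.crc32(data).to_bytes(CHECKSUM_BYTES, "big")


def make_storage_block(data: bytes) -> bytes:
    """Hypothetical writer: append zero-padded error code information to the data."""
    if len(data) != DATA_BYTES:
        raise ValueError("data portion must be 4,096 bytes")
    error_code = run_time_checksum(data).ljust(ERROR_CODE_BYTES, b"\x00")
    return data + error_code


def read_and_verify(storage_block: bytes) -> bytes:
    """Return the data only if the stored and run-time checksums are equivalent."""
    if len(storage_block) != BLOCK_BYTES:
        raise ValueError("storage block must be 4,160 bytes")
    # Data and error code information arrive together, as from a single read.
    data = storage_block[:DATA_BYTES]
    stored = storage_block[DATA_BYTES:DATA_BYTES + CHECKSUM_BYTES]
    if run_time_checksum(data) != stored:
        # Stands in for "alerting said mass storage system" (claim 70).
        raise IOError("checksum mismatch: storage block may be corrupt")
    return data
```

Because the checksum travels in the same storage block as the data, the comparison needs no second read; in this sketch a mismatch is reported by raising an exception, which is only one possible way of alerting the storage system.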

Abstract of the Disclosure

A method and apparatus for a reliable data storage system using block level checksums appended to data blocks. Files are stored on hard disks in storage blocks, including data blocks and block-appended checksums. The block-appended checksum includes a checksum of the data block, a VBN, a DBN, and an embedded checksum for checking the integrity of the block-appended checksum itself. A file system includes file blocks with associated block-appended checksum to the data blocks. The file blocks with block-appended checksums are written to storage blocks. In a preferred embodiment a collection of disk drives are formatted with 520 bytes of data per sector. For each 4,096-byte file block, a corresponding 64-byte block-appended checksum is appended to the file block with the first 7 sectors including most of the file block data while the 8th sector includes the remaining file block data and the 64-byte block-appended checksum.
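
The sector arithmetic of the preferred embodiment can be checked with a short sketch: seven 520-byte sectors hold 3,640 bytes of the file block, so the eighth sector holds the remaining 456 bytes plus the 64-byte block-appended checksum. The field order and widths inside the 64 bytes (4-byte data checksum, 8-byte VBN, 8-byte DBN, zero padding, 4-byte embedded checksum) are assumptions made for illustration, as is the use of CRC-32; the abstract lists the fields but not their layout. The function names are hypothetical.

```python
import struct
import zlib
from typing import List

SECTOR_BYTES = 520
SECTORS_PER_BLOCK = 8
FILE_BLOCK_BYTES = 4096
BAC_BYTES = 64  # block-appended checksum


def build_block_appended_checksum(file_block: bytes, vbn: int, dbn: int) -> bytes:
    """Build a 64-byte block-appended checksum using an assumed field layout."""
    # Assumed layout: data checksum (4), VBN (8), DBN (8), zero padding up to 60 bytes.
    body = struct.pack(">IQQ", zlib.crc32(file_block), vbn, dbn)
    body = body.ljust(BAC_BYTES - 4, b"\x00")
    # The embedded checksum covers the first 60 bytes of the block-appended checksum.
    return body + struct.pack(">I", zlib.crc32(body))


def to_sectors(file_block: bytes, vbn: int, dbn: int) -> List[bytes]:
    """Split a file block plus its block-appended checksum into eight 520-byte sectors."""
    if len(file_block) != FILE_BLOCK_BYTES:
        raise ValueError("file block must be 4,096 bytes")
    payload = file_block + build_block_appended_checksum(file_block, vbn, dbn)
    return [payload[i:i + SECTOR_BYTES] for i in range(0, len(payload), SECTOR_BYTES)]
```

Calling `to_sectors(file_block, vbn, dbn)` yields eight sectors of exactly 520 bytes each, with the block-appended checksum occupying the last 64 bytes of the eighth sector.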
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